Preparing lessons: Improve knowledge distillation with better supervision
Authors
Abstract
Knowledge distillation (KD) is widely applied in the training of efficient neural networks. A compact model, trained to mimic the representation of a cumbersome model on the same task, generally obtains better performance than when trained on ground-truth labels alone. Previous KD-based works mainly focus on two aspects: (1) designing various forms of feature knowledge to transfer; (2) introducing different mechanisms such as progressive learning or adversarial learning. In this paper, we revisit standard KD and observe that the teacher's logits might provide incorrect or uncertain supervision. To tackle these problems, we propose two novel approaches that deal with them respectively, called Logits Adjustment (LA) and Dynamic Temperature Distillation (DTD). To be specific, LA rectifies the teacher's logits according to the ground-truth label with certain rules, while DTD treats the temperature as a dynamic, sample-wise parameter rather than a static global hyper-parameter, which effectively reflects the uncertainty of each sample's logits. By iteratively updating the temperature, the student pays more attention to samples that confuse the teacher model. Experiments on CIFAR-10/100, CINIC-10 and Tiny ImageNet verify that the proposed methods yield encouraging improvements over KD. Furthermore, given their simple implementations, the methods can easily be attached to many KD-based frameworks and bring improvements without extra cost in time or computing resources.
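The abstract gives no implementation details, so the following PyTorch sketch only illustrates the two ideas in spirit: adjust_teacher_logits applies one possible rectification rule (swapping a wrongly predicted top logit with the true-class logit), and dynamic_temperature_kd_loss derives a per-sample temperature from the teacher's normalized entropy. Both rules, the function names, and the hyper-parameters base_temp and alpha are assumptions made for illustration, not the formulas from the paper.

```python
# Illustrative sketch of Logits Adjustment (LA) and Dynamic Temperature
# Distillation (DTD) style training signals. The concrete rules below are
# assumptions, not the exact procedures proposed in the paper.
import torch
import torch.nn.functional as F


def adjust_teacher_logits(teacher_logits, labels):
    """Rectify teacher logits whose argmax disagrees with the ground truth.

    Assumed rule: swap the (wrong) top logit with the true-class logit so the
    soft target still ranks the correct class first.
    """
    adjusted = teacher_logits.clone()
    pred = adjusted.argmax(dim=1)
    wrong = (pred != labels).nonzero(as_tuple=True)[0]
    top_vals = adjusted[wrong, pred[wrong]].clone()
    true_vals = adjusted[wrong, labels[wrong]].clone()
    adjusted[wrong, pred[wrong]] = true_vals
    adjusted[wrong, labels[wrong]] = top_vals
    return adjusted


def dynamic_temperature_kd_loss(student_logits, teacher_logits, labels,
                                base_temp=4.0, alpha=0.9):
    """KD loss with a sample-wise temperature tied to teacher uncertainty."""
    with torch.no_grad():
        probs = F.softmax(teacher_logits, dim=1)
        # Normalized entropy in [0, 1]; larger means a more confusing sample.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
        entropy = entropy / torch.log(torch.tensor(float(teacher_logits.size(1))))
        # Assumed schedule: lower the temperature for uncertain samples so
        # their targets are sharper and weigh more in the gradient.
        temp = (base_temp * (1.0 - 0.5 * entropy)).unsqueeze(1)

    soft_targets = F.softmax(teacher_logits / temp, dim=1)
    log_student = F.log_softmax(student_logits / temp, dim=1)
    kd = F.kl_div(log_student, soft_targets, reduction='none').sum(dim=1)
    kd = (kd * temp.squeeze(1) ** 2).mean()        # usual T^2 gradient scaling
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In a training loop one would typically pass the frozen teacher's outputs through adjust_teacher_logits and feed the result, together with the student's logits and the labels, to dynamic_temperature_kd_loss.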
Similar resources
Challenges for Better thesis supervision
Background: Conducting a thesis is one of the students' major academic activities. Thesis quality and the experience gained depend heavily on the supervision. Our study aimed to identify the challenges in thesis supervision from both students' and faculty members' points of view. Methods: This study was conducted using individual in-depth interviews and Focus Group Discussi...
Apprentice: Using Knowledge Distillation Techniques to Improve Low-Precision Network Accuracy
Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top performing models is deployment on resource constrained inference systems — the models (often deep networks or wide net...
Topic Distillation with Knowledge Agents
This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...
Journal
Journal title: Neurocomputing
Year: 2021
ISSN: ['0925-2312', '1872-8286']
DOI: https://doi.org/10.1016/j.neucom.2021.04.102